60 research outputs found
A General Pipeline for 3D Detection of Vehicles
Autonomous driving requires 3D perception of vehicles and other objects in
the environment. Most current methods, however, support only 2D vehicle detection.
This paper proposes a flexible pipeline that can adopt any 2D detection network
and fuse it with a 3D point cloud to generate 3D information, with minimal
changes to the 2D detection network. To identify the 3D box, an effective model
fitting algorithm is developed based on generalised car models and score maps.
A two-stage convolutional neural network (CNN) is proposed to refine the
detected 3D box. This pipeline is tested on the KITTI dataset using two
different 2D detection networks. The 3D detection results based on these two
networks are similar, demonstrating the flexibility of the proposed pipeline.
The results rank second among 3D detection algorithms, indicating its
competence in 3D detection.
Comment: Accepted at ICRA 201
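The pipeline above fits a 3D box to the point cloud associated with a 2D detection. As a rough illustration of the generic model-fitting intuition only (this is not the paper's algorithm, and every name below is our own), a candidate 3D box can be scored by the fraction of frustum points it encloses:

```python
def points_in_box(points, center, size):
    """Fraction of 3D points inside an axis-aligned candidate box.

    points: list of (x, y, z) tuples, e.g. LiDAR points in a 2D box's frustum.
    center: (cx, cy, cz) box center; size: (sx, sy, sz) box dimensions.
    A simple surrogate for a model-fitting score: more enclosed points is better.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    inside = [p for p in points
              if abs(p[0] - cx) <= sx / 2
              and abs(p[1] - cy) <= sy / 2
              and abs(p[2] - cz) <= sz / 2]
    return len(inside) / len(points) if points else 0.0

# A fitting loop would evaluate many candidate poses/sizes and keep the best:
candidates = [((0.0, 0.0, 0.0), (2.0, 2.0, 2.0)),
              ((5.0, 5.0, 5.0), (2.0, 2.0, 2.0))]
cloud = [(0.1, 0.2, 0.0), (0.3, -0.4, 0.1), (5.0, 5.0, 5.0)]
best = max(candidates, key=lambda c: points_in_box(cloud, c[0], c[1]))
```

In the actual method a learned score map and generalised car models replace this naive point count, but the search-over-candidates structure is the same.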
Robust 6D Object Pose Estimation by Learning RGB-D Features
Accurate 6D object pose estimation is fundamental to robotic manipulation and
grasping. Previous methods follow a local optimization approach which minimizes
the distance between closest point pairs to handle the rotation ambiguity of
symmetric objects. In this work, we propose a novel discrete-continuous
formulation for rotation regression to resolve this local-optimum problem. We
uniformly sample rotation anchors in SO(3), and predict a constrained deviation
from each anchor to the target, as well as uncertainty scores for selecting the
best prediction. Additionally, the object location is detected by aggregating
point-wise vectors pointing to the 3D center. Experiments on two benchmarks,
LINEMOD and YCB-Video, show that the proposed method outperforms
state-of-the-art approaches. Our code is available at
https://github.com/mentian/object-posenet.
Comment: Accepted at ICRA 202
DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation
Category-level object pose estimation involves estimating the 6D pose and the
3D metric size of objects from predetermined categories. While recent
approaches take categorical shape prior information as reference to improve
pose estimation accuracy, the single-stage network design and training manner
lead to sub-optimal performance, since the pipeline involves two distinct tasks.
In this paper, the advantage of a two-stage pipeline over a single-stage design
is discussed. To this end, we propose a two-stage deformation-and-registration
pipeline called DR-Pose, which consists of a completion-aided deformation stage
and a scaled registration stage. The first stage uses a point cloud completion
method to generate the unseen parts of the target object, guiding
subsequent deformation on the shape prior. In the second stage, a novel
registration network is designed to extract pose-sensitive features and predict
the canonical-space representation of the object's partial point cloud based on
the deformation results from the first stage. DR-Pose produces superior results
to the state-of-the-art shape prior-based methods on both CAMERA25 and REAL275
benchmarks. Code is available at https://github.com/Zray26/DR-Pose.git.
Comment: Camera-ready version accepted to IROS 202
SynTable: A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes
In this work, we present SynTable, a unified and flexible Python-based
dataset generator built using NVIDIA's Isaac Sim Replicator Composer for
generating high-quality synthetic datasets for unseen object amodal instance
segmentation of cluttered tabletop scenes. Our dataset generation tool can
render a complex 3D scene containing object meshes, materials, textures,
lighting, and backgrounds. Metadata, such as modal and amodal instance
segmentation masks, occlusion masks, depth maps, bounding boxes, and material
properties, can be generated to automatically annotate the scene according to
the users' requirements. Our tool eliminates the need for manual labeling in
the dataset generation process while ensuring the quality and accuracy of the
dataset. In this work, we discuss our design goals, framework architecture, and
the performance of our tool. We demonstrate the use of a sample dataset
generated using SynTable by ray tracing for training a state-of-the-art model,
UOAIS-Net. The results show significantly improved performance in Sim-to-Real
transfer when evaluated on the OSD-Amodal dataset. We offer this tool as an
open-source, easy-to-use, photorealistic dataset generator for advancing
research in deep learning and synthetic data generation.
Comment: Version
- …
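The modal/amodal annotations the SynTable abstract describes relate in a simple way: the amodal mask is the object's full silhouette, the modal mask is only its visible part, and the occlusion mask is their difference. A small sketch of that bookkeeping on binary masks (our own helper names; a real pipeline would operate on image arrays from the renderer):

```python
def occlusion_mask(amodal, modal):
    """Pixels in the full (amodal) silhouette that are not visible (modal).

    amodal, modal: equal-shape 2D grids of 0/1 values.
    """
    return [[bool(a) and not m for a, m in zip(row_a, row_m)]
            for row_a, row_m in zip(amodal, modal)]

def occlusion_rate(amodal, modal):
    # Fraction of the object's full silhouette that is hidden by other objects.
    total = sum(map(sum, amodal))
    hidden = sum(map(sum, occlusion_mask(amodal, modal)))
    return hidden / total if total else 0.0

# Example: a 2x2 object with its top-right pixel occluded -> 25% occluded.
amodal = [[1, 1],
          [1, 1]]
modal = [[1, 0],
         [1, 1]]
```

Computing these quantities per object per rendered frame is exactly the kind of annotation that is tedious to label by hand but trivial to emit from a synthetic-data generator, which is the tool's core value proposition.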